Batch 2 - Class 264 - Introduction to Data Science
Zoom: send meeting ID and password
Start recording
Preclass Exercise:
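Can you tile a 6x6 grid with 2x1 dominoes so that there are no fault lines (i.e. every horizontal or vertical grid line inside the square has at least one domino crossing it)?
Answer: Can't be done. Each of the 10 interior lines of the 6x6 grid splits it into two regions with an even number of cells. Dominoes lying wholly on one side cover an even number of cells there, so the number of dominoes crossing the line must be even. A line with no fault therefore needs at least two crossing dominoes. Each domino crosses exactly one line, so at least 20 dominoes would be needed, but a 6x6 grid holds only 18.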
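The counting in the answer can be checked with a few lines of Python. This is only a sketch of the arithmetic, not a search over tilings; the variable names are illustrative.

```python
N = 6  # side of the square grid

# Interior grid lines: 5 horizontal + 5 vertical.
interior_lines = 2 * (N - 1)

# Number of dominoes in any full tiling.
dominoes = N * N // 2

# A line at distance k from the edge leaves k * N cells on one side.
# If that count is even, the number of crossing dominoes must be even,
# so a line without a fault needs at least 2 crossings.
all_sides_even = all((k * N) % 2 == 0 for k in range(1, N))
min_crossings_per_line = 2 if all_sides_even else 1

# Each domino crosses exactly one line, so crossings cannot be shared.
required = interior_lines * min_crossings_per_line

print("dominoes needed:", required, "dominoes available:", dominoes)
print("ruled out by counting:", required > dominoes)
```

Changing N lets students see for which even grid sizes this counting argument stops ruling the tiling out.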
What is Data Science: Data Science is an interdisciplinary field about processes and systems to extract knowledge or insights from large volumes of data in various forms, either structured or unstructured
John Snow 1854 Cholera Map
Explain the problem John Snow was facing: finding out what led to the spread of cholera. At that time, cholera was believed to be caused by pollution or "bad air". The germ theory of disease was not yet established.
How should one explain the deaths away from the specific pump?
How should one explain people who are near the pump but didn't contract cholera?
"Friends" Experiment
Give a set of flashcards to each student, and ask them to rate on a scale of 1-5 whether they would befriend each person shown (1 is definitely no, and 5 is definitely yes). Let them record the ratings in their notebooks - give them 5 minutes to do so.
Enter each kid's ratings into the spreadsheet, one column per kid
Draw the first pivot to total a given kid's propensity to make friends using the training set (KidName in column labels, and Count of Kids in Values - can show as %)
How decisive is this kid in making a friend decision?
Draw a graph to illustrate
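For classes that prefer code to a spreadsheet, the pivot can be sketched in Python. The names and ratings below are made up, and the "decisiveness" measure (share of ratings at the extremes, 1 or 5) is only one possible reading of the question above, not a fixed definition.

```python
from collections import Counter

# Hypothetical ratings: (kid, flashcard person, rating 1-5). Made-up data.
ratings = [
    ("Asha", "P1", 5), ("Asha", "P2", 4), ("Asha", "P3", 1), ("Asha", "P4", 5),
    ("Ben", "P1", 3), ("Ben", "P2", 3), ("Ben", "P3", 2), ("Ben", "P4", 4),
]

def pivot_percent(rows):
    """Return {kid: {rating: percent of that kid's ratings}}."""
    counts = {}
    for kid, _person, rating in rows:
        counts.setdefault(kid, Counter())[rating] += 1
    table = {}
    for kid, c in counts.items():
        total = sum(c.values())
        table[kid] = {r: 100.0 * c[r] / total for r in range(1, 6)}
    return table

def decisiveness(percent_row):
    """Assumed measure: percent of ratings at the extremes (1 or 5)."""
    return percent_row[1] + percent_row[5]

table = pivot_percent(ratings)
for kid, row in table.items():
    print(kid, row, "decisive %:", decisiveness(row))
```

A kid whose ratings cluster around 3 scores low on this measure, while one who mostly gives 1s and 5s scores high.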
Draw a diagram to show what kind of friends a particular person makes - say male or female
Introduce notion of features
List the features that have been recorded so far
How could some of the features be interpreted differently - hobby as artistic or sporty, versus outdoor or indoor? How can that change conclusions?
Draw a graph
Take all the parameters to see which are the most differentiating ones
Which of the characters have the most friends?
Let's build a predictor
For a kid, take all the differentiating factors - those with more than a 60:40 split between ratings of 4 or 5 and ratings of 1, 2 or 3
Record the same on whiteboard for reference
Now take their test data, and try to predict - see how many predictions come out to be true
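The predictor steps above can be sketched in Python. The cards, feature names and ratings are invented for illustration, and combining the differentiating factors by a simple majority vote is an assumption; the class may choose a different way to combine them.

```python
from collections import defaultdict

# Hypothetical cards for one kid: (features of the person, rating 1-5).
train = [
    ({"gender": "F", "hobby": "sports"}, 5),
    ({"gender": "F", "hobby": "art"}, 4),
    ({"gender": "M", "hobby": "sports"}, 4),
    ({"gender": "M", "hobby": "art"}, 2),
    ({"gender": "F", "hobby": "sports"}, 5),
    ({"gender": "M", "hobby": "art"}, 1),
]
test = [
    ({"gender": "F", "hobby": "sports"}, 5),
    ({"gender": "M", "hobby": "art"}, 2),
]

def is_friend(rating):
    """Ratings of 4 or 5 count as 'would befriend'."""
    return rating >= 4

def learn_rules(cards, threshold=0.6):
    """Keep (feature, value) pairs with more than a 60:40 split."""
    tally = defaultdict(lambda: [0, 0])  # (feature, value) -> [friend, total]
    for features, rating in cards:
        for key in features.items():
            tally[key][0] += is_friend(rating)
            tally[key][1] += 1
    rules = {}
    for key, (yes, total) in tally.items():
        share = yes / total
        if share > threshold:
            rules[key] = True       # this value points towards "friend"
        elif share < 1 - threshold:
            rules[key] = False      # this value points towards "not friend"
    return rules

def predict(rules, features):
    """Majority vote of matching rules (assumed); None if tied or no rule."""
    votes = [rules[k] for k in features.items() if k in rules]
    yes = sum(votes)
    no = len(votes) - yes
    if yes == no:
        return None
    return yes > no

def accuracy(rules, cards):
    """Fraction of cards predicted correctly; undecided counts as a miss."""
    hits = sum(predict(rules, f) == is_friend(r) for f, r in cards)
    return hits / len(cards)

rules = learn_rules(train)
print(rules)
print("test accuracy:", accuracy(rules, test))
```

With only a handful of cards per kid, a single unusual rating can flip a rule, which is a useful starting point for the discussion of why a predictor might or might not work well.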
Discuss why a predictor might or might not work well
Homework:
Kids should complete their own exercise, and figure out the predictor using training data. Then test their predictor against the test data.